Systems and methods for organizing information relating to a study of
polymorphisms. A database model (102) is provided which interrelates
information about one or more of, e.g., subjects (112) from whom samples
(114) are extracted, primers used in extracting the DNA from the subjects,
the samples themselves, experiments done on samples, particular
oligonucleotide probe arrays used to perform experiments, analysis
procedures performed on the samples, and analysis results. The model is
readily translatable into database languages such as SQL. The database
model scales to permit storage of information about large numbers of
subjects, samples, experiments, chips, etc.
SYSTEM FOR PROVIDING A POLYMORPHISM DATABASE

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims priority from U.S. Prov. App. No. 
60/053,842 filed July 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS 
DATABASE, from U.S. Prov. App. No. 60/069,198 filed on December 11, 1997, 
entitled COMPREHENSIVE DATABASE FOR BIOINFORMATICS, and from U.S.
Prov. App. No. 60/069,436, entitled GENE EXPRESSION AND EVALUATION
SYSTEM, filed on December 11, 1997. The contents of all three provisional 
applications are herein incorporated by reference. 

The subject matter of the present application is related to the subject 
matter of the following three co-assigned applications filed on the same day as the
present application: GENE EXPRESSION AND EVALUATION SYSTEM (Attorney
Docket No. 018547-035010), METHOD AND APPARATUS FOR PROVIDING A
BIOINFORMATICS DATABASE (Attorney Docket No. 018547-033810), and METHOD
AND SYSTEM FOR PROVIDING A PROBE ARRAY CHIP DESIGN DATABASE
(Attorney Docket No. 018547-033830). The contents of these three applications are
herein incorporated by reference.

BACKGROUND OF THE INVENTION 
The present invention relates to the collection and storage of information 
pertaining to chips for processing biological samples and thereby identifying 
polymorphisms. 

The genomes of all organisms undergo spontaneous mutation in the course
of their continuing evolution, generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary
advantage or disadvantage relative to a progenitor form or may be neutral. In some
instances, a variant form confers a lethal disadvantage and is not transmitted to
subsequent generations of the organism. In other instances, a variant form confers an
evolutionary advantage to the species and is eventually incorporated into the DNA of
many or most members of the species and effectively becomes the progenitor form. In
many instances, both progenitor and variant form(s) survive and co-exist in a species
population. The coexistence of multiple forms of a sequence gives rise to
polymorphisms.

Despite the increased amount of nucleotide sequence data being generated
in recent years, only a minute proportion of the total repository of polymorphisms in
humans and other organisms has so far been identified. The paucity of polymorphisms
hitherto identified is due to the large amount of work required for their detection by
conventional methods. For example, a conventional approach to identifying
polymorphisms might be to sequence the same stretch of oligonucleotides in a population
of individuals by dideoxy sequencing. In this type of approach, the amount of work
increases in proportion to both the length of sequence and the number of individuals in a
population and becomes impractical for large stretches of DNA or large numbers of
persons.

Devices and computer systems for forming and using arrays of materials 
on a substrate have been developed. These devices and systems have been used for 
identifying polymorphisms. For example, PCT application WO92/10588, incorporated 
herein by reference for all purposes, describes techniques for sequencing or sequence
checking nucleic acids and other materials. Arrays for performing these operations may
be formed according to the methods of, for example, the pioneering techniques
disclosed in U.S. Patent No. 5,143,854 and U.S. Patent No. 5,571,639, both 
incorporated herein by reference for all purposes. 

According to one aspect of the techniques described therein, an array of
nucleic acid probes is fabricated at known locations on a chip or substrate. A
fluorescently labeled nucleic acid is then brought into contact with the chip and a scanner
generates an image file indicating the locations where the labeled nucleic acids bound to
the chip. Based upon the identities of the probes at these locations, it becomes possible
to extract information such as the identity of polymorphic forms of DNA or RNA.
Such systems have been used to form, for example, arrays of DNA that may be used to
study and detect mutations relevant to cystic fibrosis, the P53 gene (relevant to certain
cancers), HIV, and other genetic characteristics.
It would be highly useful to apply such arrays to the study of
polymorphisms on a large scale. For example, it would be useful to conduct large scale
studies on the correlation between certain polymorphisms and individual characteristics
such as susceptibility to diseases and effectiveness of drug treatments. To achieve these
benefits, it is contemplated that the operations of chip design, construction, sample
preparation, and analysis will occur on a very large scale. The quantity of information
related to each of these steps that must be stored and correlated is vast. For large scale
polymorphism studies, it will be necessary to store this information in a way that
facilitates later querying and retrieval. What is needed is a system and method
suitable for storing and organizing large quantities of information used in conjunction
with polymorphism studies.

SUMMARY OF THE INVENTION 
The present invention provides systems and methods for organizing
information relating to the study of polymorphisms. A database model is provided which
interrelates information about one or more of, e.g., subjects from whom samples are
extracted, primers used in extracting the DNA from the subjects, the samples
themselves, experiments done on samples, particular oligonucleotide probe
arrays used to perform experiments, analysis procedures performed on the
samples, and analysis results. The model is readily translatable into database
languages such as SQL. The database model scales to permit storage of information
about large numbers of subjects, samples, experiments, chips, etc.

Applications include linkage studies to determine resistance to drugs, 
susceptibility to diseases, and study of every characteristic of humans and other 
organisms that is related to genetic variability. Another application of a database
constructed according to this model is quality control of the various steps of performing a
polymorphism study. By preserving information about every step of a polymorphism 
study, one can assess the reliability of the results or use the preserved information as 
feedback to improve procedures. 

A further understanding of the nature and advantages of the inventions
herein may be realized by reference to the remaining portions of the specification and the
attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an overall system and process for forming and analyzing
arrays of biological materials such as DNA or RNA.

Fig. 2A illustrates a computer system suitable for use in conjunction with
the overall system of Fig. 1.

Fig. 2B illustrates a computer network suitable for use in conjunction with
the overall system of Fig. 1.

Fig. 3 illustrates a key for interpreting a database model.

Figs. 4A-4H illustrate a database model for maintaining information for
the system and process of Fig. 1 according to one embodiment of the present invention.

DESCRIPTION OF SPECIFIC EMBODIMENTS 
Investigation of Polymorphisms 

A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid from an individual
being analyzed. For assay of genomic DNA, virtually any biological sample (other than
pure red blood cells) is suitable. For example, convenient tissue samples include whole
blood, semen, saliva, tears, urine, fecal material, sweat, buccal, skin and hair. For
assay of cDNA or mRNA, the tissue sample must be obtained from an organ in which
the target nucleic acid is expressed. For example, if the target nucleic acid is a
cytochrome P450, the liver is a suitable source.

Many of the methods described below require amplification of DNA from
target samples. This can be accomplished by, e.g., PCR. See generally PCR
Technology: Principles and Applications for DNA Amplification (ed. H.A. Erlich,
Freeman Press, NY, NY, 1992); PCR Protocols: A Guide to Methods and Applications
(eds. Innis et al., Academic Press, San Diego, CA, 1990); Mattila et al., Nucleic Acids
Res. 19, 4967 (1991); Eckert et al., PCR Methods and Applications 1, 17 (1991); PCR
(eds. McPherson et al., IRL Press, Oxford); and U.S. Patent 4,683,202 (each of which
is incorporated by reference for all purposes).

Other suitable amplification methods include the ligase chain reaction
(LCR) (see Wu and Wallace, Genomics 4, 560 (1989); Landegren et al., Science 241,
1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA 86,
1173 (1989)), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad.
Sci. USA 87, 1874 (1990)), and nucleic acid based sequence amplification (NASBA).
The latter two amplification methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA (ssRNA) and double stranded
DNA (dsDNA) as the amplification products in a ratio of about 30 or 100 to 1,
respectively.

B. Detection of Polymorphisms in Target DNA 

There are two distinct types of analysis, depending on whether a
polymorphism in question has already been characterized. The first type of analysis is
sometimes referred to as de novo characterization. This analysis compares target
sequences in different individuals to identify points of variation, i.e., polymorphic sites.
By analyzing groups of individuals representing the greatest ethnic diversity among
humans and greatest breed and species variety in plants and animals, patterns
characteristic of the most common alleles/haplotypes of the locus can be identified, and
the frequencies of such alleles/haplotypes in the population determined. Additional allelic
frequencies can be determined for subpopulations characterized by criteria such as
geography, race, or gender. The second type of analysis is determining which form(s) of
a characterized polymorphism are present in individuals under test. There are a variety
of suitable procedures, which are discussed in turn.

1. Allele-Specific Probes 

The design and use of allele-specific probes for analyzing polymorphisms
is described by, e.g., Saiki et al., Nature 324, 163-166 (1986); Dattagupta, EP 235,726;
and Saiki, WO 89/11548. Allele-specific probes can be designed that hybridize to a segment
of target DNA from one individual but do not hybridize to the corresponding segment
from another individual due to the presence of different polymorphic forms in the
respective segments from the two individuals. Hybridization conditions should be
sufficiently stringent that there is a significant difference in hybridization intensity
between alleles, and preferably an essentially binary response, whereby a probe
hybridizes to only one of the alleles. Some probes are designed to hybridize to a
segment of target DNA such that the polymorphic site aligns with a central position
(e.g., in a 15-mer, at the 7 position; in a 16-mer, at either the 8 or 9 position) of the
probe. This probe design achieves good discrimination in hybridization between
different allelic forms.

Allele-specific probes are often used in pairs, one member of a pair
showing a perfect match to a reference form of a target sequence and the other member
showing a perfect match to a variant form. Several pairs of probes can then be
immobilized on the same support for simultaneous analysis of multiple polymorphisms
within the same target sequence.

2. Tiling Arrays 

The polymorphisms can also be identified by hybridization to nucleic acid
arrays, some examples of which are described by WO 95/11995 (incorporated by
reference in its entirety for all purposes). WO 95/11995 also describes subarrays that
are optimized for detection of variant forms of a precharacterized polymorphism. Such
a subarray contains probes designed to be complementary to a second reference
sequence, which is an allelic variant of the first reference sequence. The second group
of probes is designed by the same principles as described in the Examples except that the
probes exhibit complementarity to the second reference sequence. The inclusion of a
second group (or further groups) can be particularly useful for analyzing short
subsequences of the primary reference sequence in which multiple mutations are expected
to occur within a short distance commensurate with the length of the probes (i.e., two or
more mutations within 9 to 21 bases).

3. Allele-Specific Primers 

An allele-specific primer hybridizes to a site on target DNA overlapping a
polymorphism and only primes amplification of an allelic form to which the primer
exhibits perfect complementarity. See Gibbs, Nucleic Acid Res. 17, 2427-2448 (1989).
This primer is used in conjunction with a second primer which hybridizes at a distal site.
Amplification proceeds from the two primers, leading to a detectable product signifying
that the particular allelic form is present. A control is usually performed with a second pair
of primers, one of which shows a single base mismatch at the polymorphic site and the
other of which exhibits perfect complementarity to a distal site. The single-base
mismatch prevents amplification and no detectable product is formed. The method works
best when the mismatch is included in the 3'-most position of the oligonucleotide aligned
with the polymorphism because this position is most destabilizing to elongation from the
primer. See, e.g., WO 93/22456.
4. Direct-Sequencing

The direct analysis of the sequence of polymorphisms of the present
invention can be accomplished using either the dideoxy chain termination method or the
Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual
(2nd Ed., CSHP, New York 1989); Zyskind et al., Recombinant DNA Laboratory
Manual, (Acad. Press, 1988)).

5. Denaturing Gradient Gel Electrophoresis 

Amplification products generated using the polymerase chain reaction can 
be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can 
be identified based on the different sequence-dependent melting properties and 
electrophoretic migration of DNA in solution. Erlich, ed., PCR Technology, Principles 
and Applications for DNA Amplification, (W.H. Freeman and Co, New York, 1992), 
Chapter 7. 

6. Single-Strand Conformation Polymorphism Analysis

Alleles of target sequences can be differentiated using single-strand
conformation polymorphism analysis, which identifies base differences by alteration in
electrophoretic migration of single stranded PCR products, as described in Orita et al.,
Proc. Nat. Acad. Sci. 86, 2766-2770 (1989). Amplified PCR products can be generated
as described above, and heated or otherwise denatured, to form single stranded
amplification products. Single-stranded nucleic acids may refold or form secondary
structures which are partially dependent on the base sequence. The different
electrophoretic mobilities of single-stranded amplification products can be related to
base-sequence differences between alleles of target sequences.

Biological Material Analysis System 

One embodiment of the present invention operates in the context of a 
system for analyzing biological or other materials using arrays that themselves include 
probes that may be made of biological materials such as RNA or DNA. The VLSIPS™ 
and GeneChip™ technologies provide methods of making and using very large arrays of 
polymers, such as nucleic acids, on chips. See U.S. Patent No. 5,143,854 and PCT 
Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby 
incorporated by reference for all purposes. Nucleic acid probes on the chip are used to
detect complementary nucleic acid sequences in a sample nucleic acid of interest (the
"target" nucleic acid).

Fig. 1 illustrates an overall system 100 for forming and analyzing arrays
of biological materials such as RNA or DNA. A part of system 100 is a polymorphism
database 102. Polymorphism database 102 includes information about, e.g., biological
sources, preparation of samples, design of arrays, raw data obtained from applying
experiments to chips, analysis procedures applied, and analysis results.
Polymorphism database 102 facilitates large scale study of polymorphisms.

A chip design system 104 is used to design arrays of polymers such as
biological polymers, e.g., RNA or DNA. Chip design system 104 may be, for
example, an appropriately programmed Sun Workstation or personal computer or
workstation, such as an IBM PC equivalent, including appropriate memory and a CPU.
Chip design system 104 obtains inputs from a user regarding chip design objectives
including polymorphisms of interest, and other inputs regarding the desired features of
the array. Optionally, chip design system 104 also obtains information from external
databases such as GenBank.
The output of chip design system 104 is a set of chip design computer files in the form
of, for example, a switch matrix, as described in PCT application WO 92/10092, and
other associated computer files. The chip design computer files form a part of
polymorphism database 102. Systems for designing chips for study of polymorphisms
are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 95/11995, the
contents of which are herein incorporated by reference.

The chip design files are input to a mask design system (not shown) that
designs the lithographic masks used in the fabrication of arrays of molecules such as
DNA. The mask design system generates mask design files that are then used by
a mask construction system (not shown) to construct masks or other synthesis patterns
such as chrome-on-glass masks for use in the fabrication of polymer arrays.

The masks are used in a synthesis system (not shown). The synthesis
system includes the necessary hardware and software used to fabricate arrays of polymers
on a substrate or chip. The synthesis system includes a light source and a chemical flow
cell on which the substrate or chip is placed. A mask is placed between the light source
and the substrate/chip, and the two are translated relative to each other at appropriate
times for deprotection of selected regions of the chip. Selected chemical reagents are
directed through the flow cell for coupling to deprotected regions, as well as for washing
and other operations. The substrates fabricated by the synthesis system are optionally
diced into smaller chips. The output of the synthesis system is a chip ready for
application of a target sample.

Information about the mask design, mask construction, and probe array 
synthesis is presented by way of background. A biological source 112 is, for example, 
tissue from a plant or animal. Various processing steps are applied to material from 
biological source 112 by a sample preparation system 114. Operation of sample 
preparation system 114 in the context of a polymorphism study is discussed below in 
further detail. 

The prepared samples include nucleic acid sequences such as DNA. 
When the sample is applied to the chip by a sample exposure system 116, the nucleic 
acids may or may not bond to the probes. The nucleic acids can be tagged with 
fluorescein labels to determine which probes have bonded to nucleotide sequences from 
the sample. The prepared samples will be placed in a scanning system 118. Scanning 
system 118 includes a detection device such as a confocal microscope or CCD (charge- 
coupled device) that is used to detect the location where labeled receptors have bound to 
the substrate. The output of scanning system 118 is an image file(s) indicating, in the 
case of fluorescein labeled receptor, the fluorescence intensity (photon counts or other 
related measurements, such as voltage) as a function of position on the substrate. These 
image files may also form a part of polymorphism database 102. Since higher photon 
counts will be observed where the labeled nucleic acid(s) has bound more strongly to the 
array of probes, and since the monomer sequence of the probes on the substrate is known 
as a function of position, it becomes possible to analyze the sequence(s) of the nucleic
acid(s) that are complementary to the probes.

The image files and the design of the chips are input to an analysis system 
120 that, e.g., calls bases. Such analysis techniques are described in EPO Pub. No.
0717113A, the contents of which are herein incorporated by reference.

Chip design system 104, analysis system 120 and control portions of
exposure system 116, sample preparation system 114, and scanning system 118 may be
appropriately programmed computers such as a Sun workstation or IBM-compatible PC.
An independent computer for each system may perform the computer-implemented
functions of these systems or one computer may combine the computerized functions of
two or more systems. One or more computers may maintain polymorphism database 102
independent of the computers operating the systems of Fig. 1 or polymorphism database 102
may be fully or partially maintained by these computers.

Fig. 2A depicts a block diagram of a host computer system 210 suitable for
implementing the present invention. Host computer system 210 includes a bus 212
which interconnects major subsystems such as a central processor 214, a system memory
216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a
display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via
I/O adapter 218, a SCSI host adapter 236, and a floppy disk drive 238 operative to
receive a floppy disk 240. SCSI host adapter 236 may act as a storage interface to a
fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246.
Fixed disk 242 may be a part of host computer system 210 or may be separate and
accessed through other interface systems. A network interface 248 may provide a direct
connection to a remote server via a telephone link or to the Internet. Network interface
248 may also connect to a local area network (LAN) or other network interconnecting
many computer systems. Many other devices or subsystems (not shown) may be
connected in a similar manner.

Also, it is not necessary for all of the devices shown in Fig. 2A to be
present to practice the present invention, as discussed below. The devices and
subsystems may be interconnected in different ways from that shown in Fig. 2A. The
operation of a computer system such as that shown in Fig. 2A is readily known in the art
and is not discussed in detail in this application. Code to implement the present
invention may be operably disposed or stored in computer-readable storage media such
as system memory 216, fixed disk 242, CD-ROM 246, or floppy disk 240.

Fig. 2B depicts a network 260 interconnecting multiple computer systems
210. Network 260 may be a local area network (LAN), wide area network (WAN), etc.
Polymorphism database 102 and the computer-related operations of the other elements of
Fig. 2B may be divided amongst computer systems 210 in any way with network 260
being used to communicate information among the various computers. Portable storage
media such as floppy disks may be used to carry information between computers instead
of network 260.
Overall Description of Database

Polymorphism database 102 is preferably a relational database with a
complex internal structure. The structure and contents of polymorphism database 102 will
be described with reference to a logical model depicted in Figs. 4A-4H that describes the
contents of tables of the database as well as interrelationships among the tables. A visual
depiction of this model is an Entity Relationship Diagram (ERD), which includes
entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin
version 3.0 Methods Guide" available from Logic Works, Inc. of Princeton, NJ, the
contents of which are herein incorporated by reference. Those of skill in the art will
appreciate that automated tools such as Developer 2000 available from Oracle will
convert the ERD from Figs. 4A-4H directly into executable code such as SQL code for
creating and operating the database.

Fig. 3 is a key to the ERD that will be used to describe the contents of
polymorphism database 102. A representative table 302 includes one or more key attributes
304 and one or more non-key attributes 306. Representative table 302 includes one or
more records where each record includes fields corresponding to the listed attributes.
The contents of the key fields taken together identify an individual record. In the ERD,
each table is represented by a rectangle divided by a horizontal line. The fields or
attributes above the line are key while the fields or attributes below the line are non-key.
An identifying relationship 308 signifies that the key attribute of a parent table 310 is
also a key attribute of a child table 312. A non-identifying relationship 314 signifies that
the key attribute of a parent table 316 is also a non-key attribute of a child table 318.
Where (FK) appears in parentheses, it indicates that an attribute of one table is a key
attribute of another table. Both the depicted non-identifying and identifying relationships
are one to one-or-more relationships where one record in the parent table corresponds to
one or more records in the child table. An alternative non-identifying relationship 324 is
a one to zero-or-more relationship where one record in a parent table 320 corresponds to
zero or more records in a child table 322.
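
By way of illustration only, the two relationship kinds in the key of Fig. 3 might be rendered in SQL roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module, not the schema of Figs. 4A-4H; the table and column names (parent, identified_child, referencing_child, seq_no) are hypothetical and chosen only to show where the parent's key lands in each kind of child table.

```python
import sqlite3

# Hypothetical tables illustrating the two relationship kinds in the ERD key.
SCHEMA = """
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY
);
-- Identifying relationship: the parent's key is part of the child's key.
CREATE TABLE identified_child (
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    seq_no    INTEGER NOT NULL,
    PRIMARY KEY (parent_id, seq_no)
);
-- Non-identifying relationship: the parent's key is a non-key (FK) attribute.
CREATE TABLE referencing_child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id)
);
"""


def build_key_demo():
    """Create the demo schema in memory with one parent and three children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO parent VALUES (1)")
    conn.executemany("INSERT INTO identified_child VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.execute("INSERT INTO referencing_child VALUES (10, 1)")
    conn.commit()
    return conn


def key_columns(conn, table):
    """Return the names of the columns that make up a table's primary key."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _cid, name, _type, _notnull, _default, pk in rows if pk > 0]
```

Calling key_columns on each child table shows the distinction: the parent's key appears among the key columns of identified_child but not among those of referencing_child. The sketch does not enforce the "one-or-more" cardinality, since a plain foreign key permits a parent with no children; that constraint would need additional logic.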

Database Model 

Figs. 4A-4H are entity relationship diagrams (ERDs) showing elements of
polymorphism database 102 according to one embodiment of the present invention. Each
rectangle in the diagram corresponds to a table in database 102.

The interrelationships and general contents of the tables of database 102
will be described first. Then a chart will be presented listing and describing all of the
fields of the various tables.

Fig. 4A illustrates core elements of database 102 according to one
embodiment of the present invention. A subject table 402 lists organisms, or other tissue
sources, from which samples have been extracted for polymorphism analysis. Samples
may also be obtained from tissue collections not associated with any one identified
organism. Information stored within subject table 402 includes the name, gender,
family/position within family (e.g., father, mother, etc.), and ethnic group. For human
subjects, the name and family will preferably be represented in coded form to assure
privacy. Associated with each subject is a species as listed in a species table 404. Also,
a relationship may be defined among subjects by a subject relationship table 406 which
includes records corresponding to related subjects. These relationships may be
father-mother, sibling, twins, etc. Subjects may be part of a group that is being studied, e.g.,
a group with a congenital disease, or a toxic reaction to a particular drug. The groups
are listed in a subject group table 408. Participation of subjects in groups is defined by a
subject participation table 410 which lists all group memberships.
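
The subject-related tables just described can be sketched as follows, again using Python's sqlite3 module. The table names follow the text (subject, species, subject relationship, subject group, subject participation), but every column name, the coded subject names, and the sample rows are assumptions made for illustration; the actual fields are those listed in the chart accompanying Figs. 4A-4H.

```python
import sqlite3

# Table names follow the description above; column names are assumptions.
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE subject (
    subject_id   INTEGER PRIMARY KEY,
    coded_name   TEXT NOT NULL,      -- coded rather than real name, for privacy
    gender       TEXT,
    ethnic_group TEXT,
    species_id   INTEGER NOT NULL REFERENCES species(species_id)
);
CREATE TABLE subject_relationship (
    subject_id         INTEGER NOT NULL REFERENCES subject(subject_id),
    related_subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    relationship       TEXT NOT NULL,
    PRIMARY KEY (subject_id, related_subject_id)
);
CREATE TABLE subject_group (
    group_id    INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE subject_participation (
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    group_id   INTEGER NOT NULL REFERENCES subject_group(group_id),
    PRIMARY KEY (subject_id, group_id)
);
"""


def build_subject_db():
    """Create the sketch schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO species VALUES (1, 'Homo sapiens')")
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?, ?, ?, ?)",
        [(1, "S-001", "F", None, 1), (2, "S-002", "M", None, 1), (3, "S-003", "F", None, 1)],
    )
    conn.execute("INSERT INTO subject_relationship VALUES (1, 2, 'sibling')")
    conn.execute("INSERT INTO subject_group VALUES (1, 'toxic reaction to a particular drug')")
    conn.executemany("INSERT INTO subject_participation VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.commit()
    return conn


def group_members(conn, group_id):
    """List the coded names of all subjects participating in a group."""
    rows = conn.execute(
        """
        SELECT s.coded_name
        FROM subject_participation p
        JOIN subject s ON s.subject_id = p.subject_id
        WHERE p.group_id = ?
        ORDER BY s.coded_name
        """,
        (group_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Keeping group membership in its own table lets one subject belong to several study groups, and one group contain many subjects, without duplicating subject records.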

Samples and their attributes are listed in a sample table 412. Each sample
has an associated sample type. The sample types are listed in a sample type table 414.
Possible sample types include blood, urine, etc. Companies or institutions that provide
samples are listed in a sample source table 416.

Database 102 provides an item table 418 that includes records for items.
There are various types of items that correspond to different stages of the sample
preparation process. An "item derivation" transforms an item of one type into an item of
another type. The following table lists various item types and item derivation types for
a representative embodiment.

Item Type                                   Derived from      by Item Derivation Type

Sample                                      other samples     pooling
Sample                                      other sample      splitting
Extracted DNA                               Sample            DNA Extraction
Target (Sequences of interest amplified)    Extracted DNA     PCR
Fluorescently Labeled Target                Target            Labeling
Hybridized Chip                             Labeled Target    Hybridization (application of target to chip)
Stained Hybridized Chip                     Hybridized Chip   Staining

WO 99/05324                                                   PCT/US98/15458


Item derivations are listed in an item derivation table 420. It should be noted that 
derivations need not produce a change between item types. Each item derivation occurs 
in accordance with a protocol that characterizes the step or steps in the derivation. 
Protocols are listed in a protocol table 428. Each item derivation is performed by an 
employee listed in employee table 432. 
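
One use of the item and item derivation tables is tracing a processed item back to the sample it came from. The following is a minimal sketch, assuming a simplified two-table layout with hypothetical column names (derived_item_id, source_item_id, derivation_type) and omitting the protocol and employee references described above. It loads one chain of derivations taken from the table of item types and walks it backwards with a recursive query.

```python
import sqlite3

# Simplified, hypothetical layout: one row per item, one row per derivation step.
SCHEMA = """
CREATE TABLE item (
    item_id   INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL
);
CREATE TABLE item_derivation (
    derived_item_id INTEGER NOT NULL REFERENCES item(item_id),
    source_item_id  INTEGER NOT NULL REFERENCES item(item_id),
    derivation_type TEXT NOT NULL,
    PRIMARY KEY (derived_item_id, source_item_id)
);
"""

ITEMS = [
    (1, "Sample"), (2, "Extracted DNA"), (3, "Target"),
    (4, "Labeled Target"), (5, "Hybridized Chip"), (6, "Stained Hybridized Chip"),
]
DERIVATIONS = [
    (2, 1, "DNA Extraction"), (3, 2, "PCR"), (4, 3, "Labeling"),
    (5, 4, "Hybridization"), (6, 5, "Staining"),
]


def build_item_db():
    """Create the sketch schema in memory and load one derivation chain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item VALUES (?, ?)", ITEMS)
    conn.executemany("INSERT INTO item_derivation VALUES (?, ?, ?)", DERIVATIONS)
    conn.commit()
    return conn


def lineage(conn, item_id):
    """Return item types from the given item back to its original source."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(item_id, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT d.source_item_id, chain.depth + 1
            FROM item_derivation d
            JOIN chain ON d.derived_item_id = chain.item_id
        )
        SELECT i.item_type
        FROM chain
        JOIN item i ON i.item_id = chain.item_id
        ORDER BY chain.depth
        """,
        (item_id,),
    ).fetchall()
    return [item_type for (item_type,) in rows]
```

Because the key of item_derivation pairs a derived item with a source item, a pooled sample can be recorded as several rows sharing the same derived item, and a split sample as several rows sharing the same source item.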

Unused chips are listed in a chip table 422. Hybridized chips (i.e., chips 
that have had target applied) are listed in a hybridized chip table 424. A hybridized 
sample map table 426 lists the relationships between hybridized chips and the samples 
that have been applied to them. 

Stained hybridized chips are scanned in a process referred to here as a 
scan experiment. Scan experiments are listed in a scan experiment table 430. The scan 
experiment occurs in accordance with a protocol listed in protocol table 428. The scan 
experiment is performed by an employee listed in employee table 432. 

Fig. 4B depicts further details of the data model for items and item 
derivations. The various item types are listed in an item type table 434 and the various 
item derivation types are listed in an item derivation type table 436. The relationships 
between successive item types, e.g., sample and target are defined in an item type 
derivation table 438. An item has associated attributes. For example, for a target, 
database 102 may store the concentration, volume, location and/or remaining amount. 
All item attributes are stored in an item attribute table 440. Item attributes may be 
shared among multiple items. For example, a series of targets may all share a 
preparation date. An item attribute item map table 442 implements a many-to-many
relationship between item attributes and items. The various types of item attributes such
as preparer, preparation date, etc. are listed in an item attribute type table 444. Each 
item type has corresponding attribute types. Some attribute types are, however, shared 
among various item types. Accordingly, there is a many-to-many relationship among 
item attribute types and item types that is implemented by an item type map table 446. 

The tables of Fig. 4B represent a powerfully general model of the sample 
preparation process. Changes in process steps that require changes in the type of 
information that should be stored may be implemented by changing and adding table 
contents rather than providing new tables or changing relationships among tables. 
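
As a minimal sketch of this point, assuming an SQLite stand-in for database 102 and using
table and field names from the chart in this description, a new kind of item information
might be recorded by inserting rows only. The Value column name and all sample rows are
assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for database 102 (assumption: SQLite instead of the real DBMS).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ItemType (ItemTypeId INTEGER PRIMARY KEY, ItemTypeName TEXT);
CREATE TABLE Item (ItemId INTEGER PRIMARY KEY, ItemTypeId INTEGER, ItemName TEXT);
CREATE TABLE ItemAttributeType (ItemAttributeTypeId INTEGER PRIMARY KEY,
                                ItemAttributeName TEXT);
CREATE TABLE ItemTypeMap (ItemAttributeTypeId INTEGER, ItemTypeId INTEGER);
-- "Value" is a hypothetical column name for the attribute value.
CREATE TABLE ItemAttribute (ItemAttributeId INTEGER PRIMARY KEY,
                            ItemAttributeTypeId INTEGER, Value TEXT);
CREATE TABLE ItemAttributeItemMap (ItemAttributeId INTEGER, ItemId INTEGER);
""")

# Two targets (illustrative rows).
conn.execute("INSERT INTO ItemType VALUES (1, 'Target')")
conn.executemany("INSERT INTO Item VALUES (?, 1, ?)",
                 [(10, "target A"), (11, "target B")])

# A new kind of information is introduced with rows only; no table is altered.
conn.execute("INSERT INTO ItemAttributeType VALUES (1, 'Preparation date')")
conn.execute("INSERT INTO ItemTypeMap VALUES (1, 1)")
conn.execute("INSERT INTO ItemAttribute VALUES (100, 1, '1998-07-24')")
# One attribute record shared by several items (many-to-many map).
conn.executemany("INSERT INTO ItemAttributeItemMap VALUES (100, ?)", [(10,), (11,)])


def attributes_of(item_id):
    """Return {attribute name: value} for one item."""
    rows = conn.execute(
        """SELECT t.ItemAttributeName, a.Value
           FROM ItemAttributeItemMap m
           JOIN ItemAttribute a ON a.ItemAttributeId = m.ItemAttributeId
           JOIN ItemAttributeType t ON t.ItemAttributeTypeId = a.ItemAttributeTypeId
           WHERE m.ItemId = ?""",
        (item_id,),
    ).fetchall()
    return dict(rows)
```

Both targets resolve to the same shared attribute record, which mirrors the shared
preparation date example given for Fig. 4B.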

Fig. 4C depicts a detailed data model for storing information about 
protocols according to the present invention. Protocols as stored in protocol table 428 
represent information about particular processes that have been performed including item 
derivations, analyses, and scan experiments. Each protocol has an associated protocol 
template. Protocol templates identify protocol types. For example, one protocol 
template may be a PCR template. All protocols associated with the PCR template 
identify parameters for performing a PCR procedure. Protocol templates are listed in a 
protocol template table 448. A parameter table 450 lists all the parameters and their
values for all the protocols listed in protocol table 428. A parameter template table 452
lists the various parameter types along with default values. An example of a parameter
template would be a PCR reaction temperature. The parameter template would include a 
default value for this parameter. Parameter table 450 might then list many different PCR 
reaction temperature values that would be used by many different protocols. If a 
parameter value has not been modified by the user, it inherits the standard value of the 
associated parameter template. A parameter template set is a set of parameter templates 
that are used for a particular purpose, e.g., in association with protocols according to 
one or more protocol templates. Parameter template sets are listed in a parameter 
template set table 454. There are different types of parameter template set and these are
listed in a parameter template set type table 456. A mapping between parameter template
sets and protocol templates is defined by a protocol template set map table 458. 
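
The inheritance of a standard value from a parameter template can be sketched as follows.
This is an illustration only: the class names are hypothetical, the temperature values are
invented, and representing "not modified by the user" as None is an assumption rather
than something the data model prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParameterTemplate:
    """One row of the parameter template table (name plus standard value)."""
    name: str
    standard_value: str


@dataclass(frozen=True)
class Parameter:
    """One row of the parameter table, tied to a protocol's parameter template."""
    template: ParameterTemplate
    value: Optional[str] = None  # assumption: None means "not modified by the user"

    def effective_value(self) -> str:
        # An unmodified parameter inherits the template's standard value.
        if self.value is None:
            return self.template.standard_value
        return self.value


# Hypothetical PCR reaction temperature template and two protocol parameters.
pcr_temperature = ParameterTemplate("PCR reaction temperature", "55")
unmodified = Parameter(pcr_temperature)
modified = Parameter(pcr_temperature, "60")
```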

Protocol templates may have associated lengthy verbal information about 
how to perform protocol steps. A protocol template document table 460 stores 
references to documents that include instructions for performing protocols. 






As with the items, the data model for protocols defined by Fig. 4C is 
highly general and allows significant changes in the way item derivations, analyses, and 
experiments are performed without changing the underlying data model. 

Referring again to Fig. 4A, there are tables to record information
concerning the use of primers in PCR. A fragment table 462 lists all the sequence
fragments investigated in conjunction with database 102. Associated with each fragment
are one or more primer pairs used to amplify the fragment in a PCR process. A primer
pair table 464 lists all the primer pairs including information about whether the primer
pair actually worked to amplify the fragment. In order to develop the information about
the effectiveness of primer pairs, there is a PCR table 466 that lists records identifying
the outcome of multiple PCR operations. The individual PCR operations are identified
by reference to item derivation table 420.

A single PCR operation may be used to amplify many different fragments
and thus employ many different primer pairs. Of course, a single primer pair may be
used in multiple PCR operations. There is therefore a many-to-many relationship
between PCR operations and primer pairs that is recorded by a primer pair PCR map
table 468. Information about individual primers is stored in a primer table 470. Also,
each primer has an associated protocol in protocol table 428 that characterizes the primer
preparation process. Information about primer orders is listed in a primer order table
472. Each primer order is to a vendor and the vendors are listed in a vendor table 474.
Each primer order is made by an employee listed in employee table 432. A primer order
design map table 476 implements a many-to-many relationship between primer orders
and primers.

The data model described here thus preserves information about primers
used in PCR reactions. One can improve results by using primers that have successfully
amplified a given fragment in the past. Sometimes particular groups of primer pairs
cannot be multiplexed together in the same PCR process. The information preserved
here thus permits experimenters to make optimal use of expensive and time-consuming
PCR procedures.
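
As one hedged illustration of how the preserved primer information might be used, the
sketch below looks up the primer pairs recorded as having worked for a given fragment.
It assumes an SQLite stand-in for database 102, reuses the tblPrimerPair field names from
the chart in this description, and assumes that a Worked value of 1 means success. The
function name and the sample rows are hypothetical.

```python
import sqlite3


def proven_primer_pairs(conn, fragment_id):
    """Return ids of primer pairs recorded as having amplified the fragment."""
    rows = conn.execute(
        "SELECT PrimerPairId FROM tblPrimerPair "
        "WHERE FragmentId = ? AND Worked = 1 "  # assumption: 1 means "worked"
        "ORDER BY PrimerPairId",
        (fragment_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblPrimerPair ("
    "PrimerPairId INTEGER PRIMARY KEY, LeftPrimerId INTEGER, "
    "RightPrimerId INTEGER, PCRSize INTEGER, Worked SMALLINT, FragmentId INTEGER)"
)
# Illustrative rows only: two pairs for fragment 7 (one failed), one pair for fragment 8.
conn.executemany(
    "INSERT INTO tblPrimerPair VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 11, 12, 250, 1, 7), (2, 13, 14, 300, 0, 7), (3, 15, 16, 180, 1, 8)],
)
```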

It is also useful to preserve information about the chip production process
and the origin of individual chips. A wafer table 478 lists wafers. When chips are
produced, many chips are produced at the same time as part of a single wafer. Chip
table 422 stores references to wafer table 478 for each chip and the location of each chip
on its wafer at production time. Sometimes there is analytic significance associated with
the location of a chip on the wafer. Each wafer is produced as part of a lot and the
identity of the lot for each wafer is recorded by wafer table 478 as a reference to a lot
table 480 that lists each lot.

Fig. 4D depicts further details of tables pertaining to chip design that are
preferably maintained within polymorphism database 102 according to one embodiment
of the present invention. A tiling design table 482 lists tiling designs. Each tiling design
represents the application of a particular tiling format to a sequence to be investigated.
Tiling formats indicate probe orientation, probe length, and the position within a probe
of a single nucleotide polymorphism being investigated. In a preferred embodiment,
there may be very few tiling formats and they are listed in a tiling format table 484.

A particular tiling design includes many atom designs, each specifying the design
of a single atom. In one embodiment, an atom is a group of typically four probes used
to investigate a single base position with each probe hybridizing to a sequence including
a different base at that position. Atom designs are listed in an atom design table 486.
Records identifying the designs of individual probes are listed in a probe design table
488. A probe design role table 490 indicates the roles of probes listed in probe design
table 488 in the atom designs of atom design table 486. For combinations of probe
design and atom design, probe design role table 490 indicates which base the probe
hybridizes to at the substitution position and whether the probe represents a match or a
mismatch to the wild type.
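
The notion of an atom can be sketched as follows. This is a simplified illustration, not
the chip design procedure itself: each probe is represented by the target sequence it is
meant to hybridize to, positions are 0-based, and the function name, dictionary keys, and
example sequence are all assumptions.

```python
def design_atom(reference, position, probe_length, substitution_position):
    """Return four probe-role records for one investigated base position.

    Each record names the base at the substitution position, the target
    sequence the probe is meant to hybridize to, and whether that target
    matches (0) or mismatches (1) the reference (wild type) sequence.
    """
    start = position - substitution_position
    if start < 0 or start + probe_length > len(reference):
        raise ValueError("probe window falls outside the reference sequence")
    window = reference[start:start + probe_length]
    atom = []
    for base in "ACGT":
        target = (window[:substitution_position] + base
                  + window[substitution_position + 1:])
        atom.append({
            "substitution": base,
            "target": target,
            "mismatches": 0 if base == reference[position] else 1,
        })
    return atom


# Hypothetical reference fragment; position 6 is investigated with 5-base probes
# whose substitution position is the middle base (index 2).
atom = design_atom("GATTACAGATTACA", position=6, probe_length=5,
                   substitution_position=2)
```

The four records correspond to the kind of information the probe design role table holds
for each combination of probe design and atom design.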

A probe data table 492 gives the hybridization intensity values for
particular probe designs as determined in particular scan experiments. Each record of
the table also gives the number of pixels used to determine the intensity value and the
standard deviation of intensity as measured among the pixels.

Figs. 4E-4G depict aspects of polymorphism database 102 related to
analysis procedures and their results according to one embodiment of the present
invention. An analysis table 494 lists analyses performed. An analysis generally refers
to a non-trivial transformation of data. Records of analysis table 494 include references
to protocol table 428 to specify parameters used for each analysis. Analyses may take as
their input raw data or the results of previous analyses. An analysis dependency table
496 lists dependencies among analyses where one analysis depends on the data developed
by another analysis. An analysis input table 498 lists inputs for analyses listed in
analysis table 494.

On the right side of Fig. 4E are various tables used to support analyses.
A chip design sequence map table 500 correlates particular fragments with chip designs.
A sequence position table 502 lists investigated sequence positions indicating their
positions on a fragment. Records of sequence position table 502 reference a genomic
sequence position table 504 which gives sequence positions in the genome rather than
within individual fragments.

A scan experiment set table 506 lists sets of scan experiments. This
allows for groupings of experiments for individuals or populations to serve as the basis
for polymorphism analysis. A scan experiment used table 508 lists records indicating
memberships of a scan experiment in a scan experiment set.

A tiling data table 510 lists records identifying tiling designs as
implemented in particular chips measured by particular scan experiments. An atom data
table 512 lists the intensities measured for particular sequence positions as measured in
scan experiments identified by the tiling data records. A subject sequence position data
table 514 lists combinations of sequence position and scan experiment.

A series of tables in Figs. 4E-4G correspond to different types of analysis
that occur during the course of a polymorphism investigation. The types presented here
are merely representative. A parallel series of tables provide the analysis results. A
polymorphism analysis table 516 lists references to analysis table 494. The results of the
performed polymorphism analyses are listed in a polymorphism position result table 518.
A record of this table gives a result for a polymorphism analysis for a particular position
as determined based on a particular set of scan experiments. In one embodiment the
result is whether a particular mutation is certain, likely, possible, or not possible at the
position. The result may also be that the reference is wrong.

A user polymorphism analysis table 520 lists user interpretations of results
as listed in polymorphism position result table 518. The records of user polymorphism
analysis table 520 are references to analysis table 494. The user interpretations
themselves are stored in a user polymorphism analysis result table 522. Each result is a
likelihood of a particular mutation at a position as considered by a user plus an
accompanying user comment.

A P-Hat analysis estimates the relative concentrations of wild type
sequence and sequence having a particular mutation as determined in a particular scan
experiment. A P-Hat analysis table 524 lists references to analysis table 494. An atom
result table 526 gives estimates of the relative concentration along with upper and lower
bounds and a maximum intensity. For heterozygous mutations, the estimates of relative
concentration will cluster around 0.5. For homozygous mutations, the estimates should
cluster around 1.0.

Base call analyses are determinations of the base at a particular position
for a particular individual that may be based on more than one experiment. A base call
analysis table 528 lists references to analysis table 494. A base call result table 530 lists
the called bases for particular combinations of sequence position and subject.

A P-Hat grouping analysis determines a measure of likelihood that data in
a set of scan experiments results from separate genotypes. P-Hat grouping analyses are
listed in a P-Hat grouping analysis table 532 by reference to analysis table 494. P-Hat
grouping analysis results are listed in a mutation fraction result table 534. A group
separation is given for various combinations of sequence position and scan experiment
set.

A clustering analysis determines an alternative measure of likelihood that
data in a set of scan experiments results from separate genotypes. Clustering analyses
are listed in a clustering analysis table 536 by reference to analysis table 494. Clustering
analysis results are listed in a clustering result table 538. A clustering factor is given for
various combinations of sequence position and scan experiment set.

Fig. 4F shows tables which support normalization and footprint finding
operations that support the analyses referred to in Fig. 4E. Hybridization intensity
measurements made in scan experiments should be normalized over a set of scan
experiments. The normalization should take into account differences in amplification
level produced by different PCR processes.

Normalization is done by region of sequence. A normalization region
analysis determines the boundaries of a region to be normalized. The determination of
boundaries takes into account that different fragments of sequence are amplified by
different PCR procedures. A normalization region analysis table 540 lists normalization
region analyses by reference to analysis table 494. A normalization region result table
542 lists the boundaries for each determined normalization region.

Normalization values for identified normalization regions are themselves
determined by normalization analyses. Normalization analyses are listed in a
normalization analysis table 544 by reference to analysis table 494. A normalization
result table 546 lists the normalization values for regions.

A footprint analysis determines regions of sequence for which the
hybridization intensity is elevated for the purposes of quality control. Footprint analyses
are listed in a footprint analysis table 548 by reference to analysis table 494. Footprints
are identified by sequence starting point and ending point in a particular scan experiment
in a footprint table 550.

Fig. 4G depicts tables pertaining to measurement quality according to one
embodiment of the present invention. A tiling data quality analysis determines the
quality of results from a scan experiment. These analyses are listed in a tiling data
quality analysis table 552 by reference to analysis table 494. Tiling data quality analysis
results are listed in a tiling data quality result table 554. The results include an average
hybridization intensity value for perfect match or mismatch probes. A wild type call rate
gives the fraction of atom data where the probe corresponding to the reference base has
the highest hybridization intensity. A wild type call rate of around 1.0 indicates good
quality. Where the call rate is less than 0.75, the scan experiment should be rejected.
An accept data field indicates whether the analysis indicates rejection or acceptance.
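
The wild type call rate and the 0.75 rejection threshold described above can be sketched
as follows. The function names and the input layout (a reference base paired with a
mapping from base to intensity for each atom) are assumptions made for illustration, and
ties in intensity are resolved arbitrarily in favor of the first base encountered.

```python
def wild_type_call_rate(atoms):
    """Fraction of atoms whose brightest probe corresponds to the reference base.

    `atoms` is a sequence of (reference_base, {base: intensity}) pairs.
    """
    atoms = list(atoms)
    if not atoms:
        raise ValueError("at least one atom is required")
    hits = 0
    for reference_base, intensities in atoms:
        brightest = max(intensities, key=intensities.get)
        if brightest == reference_base:
            hits += 1
    return hits / len(atoms)


def accept_scan(atoms, threshold=0.75):
    """Reject the scan experiment when the call rate is below the threshold."""
    return wild_type_call_rate(atoms) >= threshold


# Illustrative intensities only: three of four atoms call the reference base.
example_atoms = [
    ("A", {"A": 900.0, "C": 120.0, "G": 80.0, "T": 60.0}),
    ("C", {"A": 100.0, "C": 750.0, "G": 90.0, "T": 70.0}),
    ("G", {"A": 110.0, "C": 95.0, "G": 820.0, "T": 65.0}),
    ("T", {"A": 640.0, "C": 85.0, "G": 75.0, "T": 300.0}),
]
```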

Where scan experiment measurements indicate two or more non-wild type
bases within a probe length, this indicates a measurement problem for the affected region
of sequence. These regions are identified by difficult region analyses listed in a difficult
region analysis table 556 by reference to analysis table 494. A difficult region result
table 558 lists the regions identified as being difficult.

Analysis dependency table 496 indicates interrelationships among the
various analyses of Figs. 4E-4G. A footprint analysis may depend on a normalization
analysis which may in turn depend on a normalization region analysis. A base call
analysis or P-Hat grouping analysis may depend on an atom analysis. A polymorphism
analysis may depend on any of these analyses and/or a user polymorphism analysis
and/or a clustering analysis.
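
One way such dependency records might be used is to derive an order in which analyses
can be run. The sketch below is an assumption about usage, not part of the described
data model. It treats each record as a (parent, sub) pair, following the
ParentAnalysisId and SubAnalysisId fields of the chart, and relies on the standard
graphlib module of Python 3.9 or later. The function name and the string identifiers
are hypothetical.

```python
from graphlib import TopologicalSorter


def run_order(dependency_rows):
    """Order analyses so that each one runs after the analyses it depends on.

    `dependency_rows` holds (parent, sub) pairs: the parent analysis provides
    input and the sub analysis receives it.
    """
    sorter = TopologicalSorter()
    for parent, sub in dependency_rows:
        sorter.add(sub, parent)  # sub depends on parent
    return list(sorter.static_order())


# The footprint chain described above, with readable names in place of numeric ids.
rows = [
    ("normalization region", "normalization"),
    ("normalization", "footprint"),
]
order = run_order(rows)
```

A cyclic set of records would raise graphlib.CycleError, which may serve as a basic
consistency check on the dependency table.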

Another aspect of the investigation of polymorphisms is seeking patent 
protection for identified polymorphisms. Fig. 4H shows tables of polymorphism 
database 102 related to efforts to seek patent protection according to one embodiment of 
the present invention. A polymorphism patent sequence table 560 lists sequences for 
which patent protection is sought. A patent application table 562 lists patent applications 
directed toward the protection of polymorphisms. A polymer patent application sequence 
map table 564 implements a many-to-many relationship between patent applications and 
sequences. A prior application table 566 lists relationships between patent applications 
and prior related patent applications. An attorney table 568 lists attorneys responsible 
for preparing patent applications listed in patent application table 562. A law firm table 
570 lists the law firms to which the attorneys listed in attorney table 568 belong. 

An employee group table 572 lists groups of inventors for the patent 
applications listed in table 562. Individual inventors are listed in employee table 432. 
An employee group map table 574 implements a many-to-many relationship between 
inventors and groups of inventors. 

The data model of Fig. 4H greatly facilitates the process of securing patent 
protection for polymorphisms and thereby increases the commercial incentive for 
investigation of polymorphisms. 

Database Contents 

The contents of the tables introduced above will now be presented in 
greater detail in the following chart. 



TABLE
  FIELD                                 COMMENT


tblSubject
  SubjectId: INTEGER                    Identifies biological source of sample.
  SpeciesID: INTEGER                    Species of subject.
  Name: VARCHAR2(20)                    Name of subject (anonymized for human subjects).
  Gender: VARCHAR2(10)                  Gender of subject.
  Family: VARCHAR2(20)                  Family of subject (anonymized for human subjects).
  Member: SMALLINT                      Position in family (father, mother, etc.).
  Group_: VARCHAR2(20)                  Ethnic group.
  CellLineID: VARCHAR2(20)              Identifier for sample source not associated with particular organism.
  [illegible]: SMALLINT                 Whether or not subject is in a group.


tblSpecies
  SpeciesId: INTEGER                    Species identifier.
  Name: VARCHAR2(30)                    Name of species.


SubjectRelationship
  Subject1: INTEGER                     First subject in relationship.
  Subject2: INTEGER                     Second subject in relationship.
  Position: VARCHAR2(2)                 Nature of relationship.


tblSubjectGroup
  GroupId: INTEGER                      Identifier of group of subjects (not same as ethnic group).
  GroupCode: VARCHAR2(20)               Code identifier for group.
  Comments: LONG VARCHAR                User comments on group.
  upsize_ts: DATE                       Creation date for group.


tblSubjectParticipation
  SubjectId: INTEGER                    Reference to subject table.
  GroupId: INTEGER                      Reference to subject group table.


tblSample
  SampleId: INTEGER                     Sample identifier.
  SubjectID: INTEGER                    Reference to subject table.
  SampleSourceId: CHAR(18)              Institutional source of sample.
  Code: VARCHAR2(20)                    Code representing individual subject.
  Recipient: VARCHAR2(20)               Person accepting sample.
  Provider: VARCHAR2(20)                Person or institution providing sample.
  DateReceived: DATE                    Date sample received.
  ProtocolId: INTEGER                   Reference to protocol table.
  SampleTypeId: INTEGER                 Reference to sample type table.


tblSampleType
  SampleTypeId: INTEGER                 Sample type identifier.
  Description: VARCHAR2(50)             Description of sample type.


tblSampleSource
  SampleSourceId: CHAR(18)              Identifier of institutional sample source.
  ProviderName: VARCHAR2(20)            Name of individual or institutional sample provider.


Item
  ItemId: INTEGER                       Item identifier.
  ItemTypeId: INTEGER                   Item type identifier.
  ItemName: VARCHAR2(50)                Name of item.


ItemDerivation
  Item1Id: INTEGER                      Derivation source.
  Item2Id: INTEGER                      Derivation result.
  EmployeeId: INTEGER                   Employee responsible for derivation.
  DerivationTypeId: INTEGER             Derivation type identifier.
  ProtocolId: VARCHAR2(18)              Reference to protocol table.
  Date: DATE                            Date of derivation.


tblChip
  ChipId: INTEGER                       Renamed reference to item table.
  ChipDesignPlacementId: INTEGER        Placement on wafer.
  LocationId: INTEGER                   Location of chip.
  WaferId: INTEGER                      Wafer the chip was on.


tblHybedChip
  HybedChipId: INTEGER                  Renamed reference to item table.
  SubjectID: INTEGER                    Reference to subject table.
  ProtocolId: INTEGER                   Reference to protocol table.
  Repetition: SMALLINT                  Number of times chip has been washed and reused.


tblHybSampleMap
  ItemId: INTEGER                       Reference to item table.


Protocol
  ProtocolId: INTEGER                   Protocol identifier.
  ProtocolTemplateId: INTEGER           Protocol template identifier.
  Name: VARCHAR2(100)                   Name of protocol.


tblScanExperiment
  ScanExptId: INTEGER                   Scan experiment identifier.
  ItemId: INTEGER                       Reference to item table.
  ScanCode: VARCHAR2(25)                File for scan results.
  ProtocolId: INTEGER                   Reference to protocol table.
  ScanRatingId: INTEGER                 Assessment of scan quality.
  ExperimenterId: INTEGER               Experimenter identifier.
  Date: DATE                            Date of experiment.
  ConversionTool: VARCHAR2(50)          Program used to convert from scan image to intensities.
  ConversionDate: DATE                  Date of conversion.
  ScanStatus: VARCHAR2(50)              Whether or not scan image has been converted to intensities.
  Comments: LONG VARCHAR                Comments.








Employee
  EmployeeId: INTEGER                   Employee identifier.
  EmployeeCode: VARCHAR2(5)             Code for employee.
  FName: VARCHAR2(20)                   First name of employee.
  MName: VARCHAR2(20)                   Middle name of employee.
  LName: VARCHAR2(20)                   Last name of employee.


ItemType
  ItemId: INTEGER                       Item type identifier.
  ItemTypeName: VARCHAR2(30)            Name of item type.
  FormName: VARCHAR2(100)               Reference to user interface form for item type.


ItemDerivationType
  DerivationTypeId: INTEGER             Derivation type identifier.
  DerivationType: VARCHAR2(50)          Description of derivation type.


ItemTypeDerivation
  NextItemTypeId: INTEGER               Result type of derivation.
  ItemTypeId: INTEGER                   Source type of derivation.


ItemAttribute
  ItemAttributeId: INTEGER              Item attribute identifier.
  ItemAttributeTypeId: INTEGER          Reference to item attribute type table.
  [illegible]                           Attribute value.


ItemAttributeItemMap
  ItemAttributeId: INTEGER              Reference to item attribute table.
  ItemId: INTEGER                       Reference to [remainder illegible]


ItemAttributeType
  ItemAttributeTypeId: INTEGER          Item attribute identifier.
  ItemAttributeName: VARCHAR2(30)       Name of item attribute type.


ItemTypeMap
  ItemAttributeTypeId: INTEGER          Reference to item attribute type table.
  ItemTypeId: INTEGER                   Reference to item type table.


ProtocolTemplate
  ProtocolTemplateId: INTEGER           Protocol template identifier.
  Name: VARCHAR2(100)                   Name of protocol template.
  DateCreated: DATE                     Date protocol template created.
  FormName: VARCHAR2(50)                Name of the electronic form used for protocol template.


Parameter
  ParameterId: INTEGER                  Parameter identifier.
  ParameterTemplateId: INTEGER          Reference to parameter template table.
  Value: VARCHAR2(20)                   Value of parameter.
  ProtocolID: INTEGER                   Reference to protocol table.


ParameterTemplate
  ParameterTemplateId: INTEGER          Parameter template identifier.
  Name: VARCHAR2(100)                   Name of parameter template.
  ParamTemplateSetId: INTEGER           Reference to parameter template set table.
  StandardValue: VARCHAR2(100)          Default value for parameter.


ParamTemplateSet
  ParamTemplateSetId: INTEGER           Parameter template set identifier.
  TypeId: INTEGER                       Renamed reference to parameter template set type table.
  Name: VARCHAR2(20)                    Name of parameter template set.


ParamTemplateSetType
  ParamTempSetTypeId: INTEGER           Parameter template set type identifier.
  Description: VARCHAR2(50)             Description of parameter template set type.


ParameterTemplateSetMap
  ProtocolTemplateId: INTEGER           Reference to protocol template table.
  ParamTemplateSetId: INTEGER           Reference to parameter template set table.


ProtocolTemplateDoc
  ProtocolDocId: INTEGER                Protocol template document identifier.
  ProtocolTemplateId: INTEGER           Reference to protocol template table.
  Name: VARCHAR2(100)                   Name of protocol template.
  PathAndFileName: VARCHAR2(50)         File name for protocol template document.
  AuthorName: INTEGER                   Author of protocol template document.
  CreationDate: DATE                    Creation date of protocol template document.


tblFragment
  FragmentId: INTEGER                   Fragment identifier.
  ChipSequence: LONG VARCHAR            Sequence of fragment.
  Code: VARCHAR2(50)                    Code representing fragment.


tblPrimerPair
  PrimerPairId: INTEGER                 Identifier for primer pair.
  LeftPrimerId: INTEGER                 Left primer identifier.
  RightPrimerId: INTEGER                Right primer identifier.
  PCRSize: INTEGER                      Length of amplified fragment.
  Worked: SMALLINT                      Whether or not pair successfully amplified fragment.
  FragmentId: INTEGER                   Reference to fragment table.


tblPCR
  Item1Id: INTEGER                      First part of reference to item derivation table.
  Item2Id: INTEGER                      Second part of reference to item derivation table.
  Reactionworked: SMALLINT              Whether or not PCR reaction worked.


PrimePairPCRMap
  PrimerPairId: INTEGER                 Reference to primer pair table.
  Item1ID: INTEGER                      First part of referenced item derivation table.
  Item2Id: INTEGER                      Second part of referenced item derivation table.


tblPrimer
  PrimerId: INTEGER                     Primer identifier.
  ProtocolId: INTEGER                   Reference to protocol table.
  OligoSeq: VARCHAR2(35)                Sequence of primer.
  Position: INTEGER                     Position of primer on fragment.
  Length: INTEGER                       Length of primer.
  MeltingTemp: INTEGER                  Melting temperature of primer.
  Direction: VARCHAR2(20)               Direction (forward or reverse).


tblPrimerOrder
  OrderId: INTEGER                      Order identifier.
  EmployeeId: INTEGER                   Employee who made order.
  VendorId: INTEGER                     Vendor for order.
  OrderDate: DATE                       Date of order.
  Owner: VARCHAR2(50)                   Name of employee making order.
  Vendor: VARCHAR2(50)                  Name of vendor.


tblVendor
  VendorId: INTEGER                     Vendor identifier.
  Vendor: VARCHAR2(50)                  Name of vendor.
  PhoneNumber: VARCHAR2(15)             Phone number of vendor.
  FaxNumber: VARCHAR2(15)               Fax number of vendor.
  Address: VARCHAR2(50)                 Address of vendor.
  City: VARCHAR2(50)                    City of vendor.
  State: VARCHAR2(50)                   State of vendor.
  Zip: VARCHAR2(50)                     Zip code of vendor.


tblPrimerOrderDesignMap
  PrimerId: INTEGER                     Reference to primer table.
  OrderId: INTEGER                      Reference to order table.


tblWafer
  WaferId: INTEGER                      Wafer identifier.
  LotId: INTEGER                        Lot to which wafer belongs.
  Code: VARCHAR2(8)                     Code for wafer.
  SynthesisDate_delete: DATE            Synthesis date for wafer.
  Released: DATE                        Date wafer available.
  Done: SMALLINT                        Whether wafer production is complete.
  ExpirationDate: DATE                  Expiration date of wafer.
  ExpectedLife: CHAR(18)                Expected useful life of wafer.


tblLot
  LotId: INTEGER                        Lot identifier.
  WaferDesignId: INTEGER                Identifier for wafer design.
  LotNumber: VARCHAR2(12)               Lot number.
  WaferPN: VARCHAR2(50)                 Part number for wafer.


tblTilingDesign
  TilingDesignID: INTEGER               Tiling design identifier.
  ChipDesignSequenceMapID: NUMBER       Reference to chip design sequence map.
  TilingFormatID: INTEGER               Reference to tiling format table.
  UnitNumber: INTEGER                   1 for sense, 0 for antisense.
  AtomOffset: INTEGER                   Number to add to translate atom position in tiling to atom position in chip design.


tblTilingFormat
  TilingFormatID: INTEGER               Tiling format identifier.
  Orientation: CHAR(18)                 Orientation for tiling.
  ProbeLength: SMALLINT                 Length of probes.
  SubstitutionPosition: SMALLINT        Substitution position for mutation base in probes.


tblAtomDesign
  AtomDesignId: NUMBER                  Atom design identifier.
  TilingDesignID: INTEGER               Reference to tiling design table.
  Position: INTEGER                     Position of atom in sequence.


tblProbeDesign
  ProbeDesignID: NUMBER                 Probe design [remainder illegible]
  ChipDesignId: INTEGER                 Reference to [remainder illegible]
  x: SMALLINT                           x position of probe.
  y: SMALLINT                           y position of probe.


tblProbeDesignRole
  ProbeDesignID: NUMBER                 Reference to probe design table.
  AtomDesignID: NUMBER                  Reference to atom design table.
  Substitution: CHAR(18)                Substitution position in probe design.
  Mismatches: NUMBER                    Whether probe is match or mismatch.


tblProbeData
  ProbeDesignID: NUMBER                 Reference to probe design table.
  ScanExptID: INTEGER                   Reference to scan experiment table.
  Intensity: FLOAT                      Measured hybridization intensity for probe.
  NPixels: NUMBER                       Number of pixels used for intensity calculation.
  StDev: NUMBER                         Standard deviation for pixels.


tblAnalysis 
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1 AjSLjEj 








Analysisld : INTEGER 


Analysis 
identifier. 




AnalysisVersionID:INTEGER 


Reference to 
version of 




ProtocolID : INTEGER 


Reference to 
protocol table.




DatePerformed:DATE


Date analysis 
performed. 




NeedsUpdate:NUMBER 


Whether 
analysis is 
current. 


tblAnalysisDependency 








ParentAnalysisId:INTEGER 


Analysis 

providing 

input. 




SubAnalysisId:INTEGER


Analysis 
receiving 
input. 




Role:VARCHAR2(20) 


Role of data 
provided by 
parent 
analysis. 


tblAnalysisInput








AnalysisInputID:INTEGER


Analysis input 
identifier. 




AnalysisId:INTEGER 


Analysis 
receiving 
input. 




InputType:VARCHAR2(20)


Type of input. 




ObjectID :INTEGER 


Reference to 
input data. 


tblChipDesignSequenceMap








ChipDesignSequenceMapID:NUMBER


Chip design 
sequence map 
identifier. 




FragmentID : INTEGER 


Reference to 

fragment 

table. 




ChipDesignId:INTEGER 


Chip design 
identifier. 




AtomOffset:NUMBER


# to add to 
translate atom 
position in 
tiling to atom 

position in

chip design 


tblSequencePosition 








SequencePositionID:NUMBER


Sequence position 
identifier. 




ChipDesignSequenceMapID:NUMBER


Reference to chip 
design sequence 
map table. 




Position : NUMBER 


Position in 
fragment. 




GenomicSequencePositionID:INTEGER


Reference to 
genomic sequence 
position table. 




RefBase: INTEGER 


Reference base. 


tblGenomicSequencePosition 








GenomicSequencePositionID:INTEGER


Genomic 
sequence position 
identifier. 


tblScanExperimentSet 








ScanExperimentSetID:NUMBER 


Scan experiment 
set identifier. 


tblScanExperimentUsed








ScanExptID : INTEGER 


Reference to scan 
experiment table. 




ScanExperimentSetID:NUMBER 


Reference to scan 
experiment set 
table. 


tblTilingData 








TilingDataID:NUMBER


Tiling data 
identifier. 




ScanExptID : INTEGER 


Reference to scan 
experiment table. 




TilingDesignID:INTEGER 


Reference to 
tiling design 
table. 


tblAtomData 




AtomDataID:INTEGER


Atom data 
identifier. 




TilingDataID:NUMBER 


Reference to 
tiling data table. 




SubjectSequencePositionID:INTEGER


Reference to 
subject sequence 
position table. 


tblSubjectSequencePosition 








SubjectSequencePositionID:INTEGER 


Subject sequence 

position 

identifier. 




SubjectID:INTEGER


Reference to 
subject table. 




SequencePositionID: NUMBER 


Reference to 
sequence position 
table. 


tblPolymorphismAnalysis








AnalysisId:INTEGER 


Reference to 
analysis table. 


tblPolyPositionResult 








AnalysisId:INTEGER


Reference to 
analysis table. 




PolyPositionID:INTEGER


Polymorphism 

position 

identifier. 




ScanExperimentSetID: NUMBER 


Reference to scan 
experiment set 
table. 




PolyPositionTypeID:INTEGER


Refers to 
possibility of 
polymorphism at 
position, e.g., 
certain, likely, 
possible
mismatch 
(reference is 
wrong). 




WTBase:CHAR(18) 


Wild type base at 
position. 




MuBase: INTEGER 


Mutation base at 
position. 


tblUserPolyanalysis 








AnalysisId:INTEGER 


Reference to 
analysis table. 


tblUserPolyanalysisResult 








AnalysisId:INTEGER


Reference to 
analysis table. 




SequencePositionID:NUMBER 


Reference to 
sequence position 
table. 




ScanExperimentSetID:NUMBER 


Reference to scan 
experiment set 
table. 




PolyPositionTypeID:INTEGER


See 

polymorphism 
position result 
table. 




UserComment:VARCHAR2(256)


User comment 
on

polymorphism 
analysis. 


tblAtomanalysis 








AnalysisId:INTEGER


Reference to 
analysis table. 








tblAtomResult








AnalysisId:INTEGER


Reference to 
analysis table. 




AtomDataID:INTEGER


Reference to 
atom data table. 




PHat: FLOAT 


Relative 

concentration of 
mutant and wild 
type. 




PHatUpperbound: FLOAT 


Upperbound for 

relative 

concentration. 




PHatLowerbound:FLOAT 


Lowerbound for 

relative 

concentration. 




MaxIntensity:FLOAT


Maximum 
measured 
intensity for 
atom. 




WTIntensity:FLOAT 


Measured wild 
type intensity. 




MutIntensity:FLOAT


Measured 

mutation 

intensity. 




LocalWTCallRate:FLOAT


Rate at which
atoms associated
with surrounding
sequence call
reference base.




IntensityRatio : FLOAT 


Ratio of intensity 
of wild type 
probe over 
intensity of 
mutation probe. 


tblBaseCallAnalysis 








AnalysisId:INTEGER


Reference to 
analysis table. 


tblBaseCallResult








AnalysisId:INTEGER


Reference to 
analysis table. 




SubjectSequencePositionID:INTEGER 


Reference to 
subject sequence position
table. 




ScanExperimentSetID:NUMBER 


Reference to scan
experiment set
table. 




CalledBase:VARCHAR2(1)


Base called for 
subject based on 
experiment set. 




SuggestCheck:NUMBER 


Used to indicate 
whether this 
sample should be 
used for 
resequencing 


tblClusteringAnalysis








AnalysisId:INTEGER


Reference to 
analysis table. 


tblClusteringResult 








AnalysisId:INTEGER


Reference to 
analysis table. 




SequencePositionID : NUMBER 


Reference to 
sequence position 
table. 




ScanExperimentSetID:NUMBER


Reference to scan 
experiment set 
table. 




ClusteringFactor: FLOAT 


Result of 
clustering 

analysis.










AnalysisId:INTEGER 


Reference to 
analysis table.


tblNormalizationRegion 








NormalizationRegionID:INTEGER


Normalization 
region identifier. 




AnalysisId:INTEGER 


Reference to 
analysis table.




ChipDesignSequenceMapID : NUMBER 


Reference to chip 
design sequence 
map table. 




ScanExperimentSetID:NUMBER


Reference to scan 
experiment set 
table. 




RegionEnd:INTEGER


Indication of end 
of the 

normalization 
region. 




RegionStart:INTEGER


Indication of 
beginning of the 
normalization 
region. 


tblNormalizationAnalysis 








AnalysisId:INTEGER


Reference to 
analysis table. 


tblNormalizationResult








NormalizationResultID:INTEGER


Normalization 
result identifier. 




AnalysisId:INTEGER


Reference to 
analysis table. 




TilingDataID:INTEGER


Reference to 
tiling data table. 




NormalizationRegionResultID:INTEGER


Reference to 

normalization 

result. 




NormalizationValue: NUMBER 


Value used for 
normalization. 




DataOK:NUMBER 



Indication 
whether 
normalization 
result is usable. 


tblFootprintAnalysis 








AnalysisId:INTEGER


Reference to 
analysis table. 


tblFootprint 








FootprintID:NUMBER 


Footprint 
identifier. 




AnalysisId:INTEGER 


Analysis 
identifier. 




ChipDesignSequenceMapID:NUMBER 


Reference to chip 
design sequence 
map table. 




ScanExperimentSetID:NUMBER 


Reference to scan 
experiment set 
table. 




FPStart:NUMBER


Start of footprint 
and sequence. 




FPEnd: NUMBER 


End of footprint 
and sequence. 


tblTilingDataQualityAnalysis








AnalysisId:INTEGER


Reference to 
analysis table. 


tblTilingDataQualityResult








TilingDataID:NUMBER


Reference to 
tiling data table. 




AnalysisId:INTEGER


Reference to 
analysis table. 




AvgWTIntensity:NUMBER


Average wild 
type intensity. 




WTCallRate: NUMBER 


Fraction of atoms 
where brightest 
of probes is one 
with reference 
base.




AcceptData: INTEGER 


Whether data is 
of acceptable 
quality. 


tblDifficultRegionAnalysis









AnalysisId:INTEGER


Reference to 
analysis table. 


tblDifficultRegionResult 








ScanExptId:INTEGER


Reference to scan 
experiment table. 




AnalysisId:INTEGER


Reference to 
analysis table. 




ChipDesignSequenceMapID:NUMBER 


Reference to chip 
design sequence 
map table. 




RgnStart:NUMBER 


Beginning of 
difficult region in 
sequence. 




RgnEnd: NUMBER 


End of difficult 
region in 
sequence. 




Reason : INTEGER 


Code indicating 
reason for 
difficult region, 
e.g., two or more 
non-wild type 
bases and less 
than a probe 
length.


tblPolyPatentSeq








PolyPatentSeqId:NUMBER


Polymorphism 

sequence 

identifier. 




Polyscreen:VARCHAR2(50) 


reference to 
internal grouping 
of polymorphisms 




FragmentCode:VARCHAR2(50) 


Fragment 
sequence found in 




Position:LONG


Position of 
polymorphism. 




RefAllele:CHAR(2)


Wild type base at 
position. 




FreqP: FLOAT 


Frequency of 
wild type. 




AltAllele:CHAR(2) 


Mutation base at
position. 




FreqQ:FLOAT


Frequency of 
mutation base. 




Heterozygocity :FLOAT 


Heterozygocity 
value. 




SequenceTag:VARCHAR2(50) 


Sequence 

containing 

polymorphism 

including

ambiguity code at 

polymorphism 

position. 




GeneName:VARCHAR2(50)


Name of gene. 




ChromosomeNum:VARCHAR2(20) 


Chromosome 
number. 




ChromosomeLoc:VARCHAR2(20) 


Location of gene 
on chromosome. 




ForwardPrimer:VARCHAR2(50) 


Identifier for
forward primer
used to amplify
fragment.




ReversePrimer:VARCHAR2(50)


Identifier of 
primer used to 
amplify fragment. 


tblPatentApp








PatentAppId:NUMBER 


Patent application 
identifier. 




GroupId:NUMBER


Reference to 
employee group 
table. 




AttorneyId:NUMBER


Reference to 
attorney table. 




DocketNum:VARCHAR2(30) 


Docket number 
for patent 
application. 




FilingDate:DATE 


Filing date of
patent application.




Classification:VARCHAR2( )


Patent office 
classification for 
patent 
application. 




SerialNumber:VARCHAR2(50) 


Serial number 
assigned by 
patent office. 




CountryCode:VARCHAR2(50) 


Country in which 
patent application 
was filed. 




InventionTitle:VARCHAR2(100) 


Title for patent 
application 


tblPolyPatentSeqMap 








PatentAppId:NUMBER


Reference to 
patent application 
table. 




PolyPatentSeqId:NUMBER


Reference to

polymorphism 
patent sequence 
table. 


tblPriorApp 








PriorAppId:NUMBER 


Reference to 
related prior 
patent application 
in patent 
application table. 




AppId:NUMBER


Reference to 
application to
which prior 
application is 
related. 


tblAttorney 








AttorneyId:NUMBER


Attorney 
identifier. 




LawFirmId:NUMBER


Law firm where 
attorney works. 




FirstName:VARCHAR2(20) 


First name of 
attorney. 




MiddleName:VARCHAR2(5) 


Middle name of 
attorney. 




LastName:VARCHAR2(30) 


Last name of 
attorney. 




RegistrationNum:VARCHAR2( )


Patent office

registration 
number of 
attorney. 


tblLawFirm 








LawFirmId:NUMBER


Law firm 
identifier. 




Company:VARCHAR2(100)


Name of law 
firm. 




Address:VARCHAR2(100) 


Address of law 
firm. 




City:VARCHAR2(30) 


City address of 
law firm. 




State:VARCHAR2(20)


State address of 
law firm. 




ZipCode:VARCHAR2(15) 


Zip Code of law 
firm. 




Country:VARCHAR2(15) 


Country of law 
firm. 




Telephone:VARCHAR2(30)


Telephone
number of law
firm.




Fax:VARCHAR2(30)


Facsimile number
of law firm.




TELEX:VARCHAR2(20)


Telex number of
law firm.


tblEmployeeGroup 








GroupId:NUMBER 


Identifier for 
inventor group. 




GroupName:VARCHAR2(50)


Name of inventor 
group. 




Comments:VARCHAR2(50) 


Comments. 




GroupList:VARCHAR2(255) 


Written out list of 
inventor names. 


tblEmployeeGrpMap 








EmployeeId:INTEGER


Reference to 
employee table
for
inventor/employees.




GroupId:NUMBER 


Reference to 
inventor group 
table. 
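
As an illustration of how a few of the tables listed above might be expressed in SQL, the following is a minimal sketch using Python's built-in sqlite3 module. It covers only tblSubjectSequencePosition, tblAtomData, and tblAtomResult, and only some of their fields. The mapping of NUMBER and FLOAT to SQLite INTEGER and REAL, the choice of primary and foreign keys, the sample values, and the helper names build_demo_db and phat_for_subject are all assumptions made for this sketch; the listing above does not specify them.

```python
import sqlite3

# Illustrative subset of the schema. Key constraints and the SQLite type
# mapping are assumptions of this sketch, not taken from the table listing.
SCHEMA = """
CREATE TABLE tblSubjectSequencePosition (
    SubjectSequencePositionID INTEGER PRIMARY KEY,
    SubjectID                 INTEGER NOT NULL,
    SequencePositionID        INTEGER NOT NULL
);
CREATE TABLE tblAtomData (
    AtomDataID                INTEGER PRIMARY KEY,
    TilingDataID              INTEGER NOT NULL,
    SubjectSequencePositionID INTEGER NOT NULL
        REFERENCES tblSubjectSequencePosition (SubjectSequencePositionID)
);
CREATE TABLE tblAtomResult (
    AnalysisId     INTEGER NOT NULL,
    AtomDataID     INTEGER NOT NULL REFERENCES tblAtomData (AtomDataID),
    PHat           REAL,
    PHatUpperbound REAL,
    PHatLowerbound REAL,
    PRIMARY KEY (AnalysisId, AtomDataID)
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up atom result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Subject 7 at sequence position 42 (hypothetical sample values).
    conn.execute("INSERT INTO tblSubjectSequencePosition VALUES (1, 7, 42)")
    conn.execute("INSERT INTO tblAtomData VALUES (10, 3, 1)")
    conn.execute("INSERT INTO tblAtomResult VALUES (100, 10, 0.48, 0.55, 0.41)")
    conn.commit()
    return conn


def phat_for_subject(conn, subject_id):
    """Return (position, PHat, lower bound, upper bound) rows for a subject."""
    cursor = conn.execute(
        """
        SELECT ssp.SequencePositionID, r.PHat,
               r.PHatLowerbound, r.PHatUpperbound
        FROM tblAtomResult r
        JOIN tblAtomData d ON d.AtomDataID = r.AtomDataID
        JOIN tblSubjectSequencePosition ssp
          ON ssp.SubjectSequencePositionID = d.SubjectSequencePositionID
        WHERE ssp.SubjectID = ?
        ORDER BY ssp.SequencePositionID
        """,
        (subject_id,),
    )
    return cursor.fetchall()
```

Calling phat_for_subject(build_demo_db(), 7) follows the references from atom result to atom data to subject sequence position, which is the chain of "Reference to" comments in the listing.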
It is understood that the examples and embodiments described herein are 
for illustrative purposes only and that various modifications or changes in light thereof 
will be suggested to persons skilled in the art and are to be included within the spirit and 
purview of this application and scope of the appended claims. For example, tables may 
be deleted, contents of multiple tables may be consolidated, or contents of one or more 
tables may be distributed among more tables than described herein to improve query 
speeds and/or to aid system maintenance. Also, the database architecture and data 
models described herein are not limited to biological applications but may be used in any 
application. All publications, patents, and patent applications cited herein are hereby 
incorporated by reference. 
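
The consolidation of multiple tables mentioned above can be sketched as follows, again with Python's sqlite3 module. The example merges the attorney and law firm tables into a single flattened table so that a lookup needs no join. The name tblAttorneyFlat, the reduced column sets, the key choices, and the sample rows are hypothetical and serve only to illustrate the idea.

```python
import sqlite3


def consolidate_attorney_tables(conn):
    """Build a flattened table combining tblAttorney with tblLawFirm.

    tblAttorneyFlat is a hypothetical name chosen for this sketch.
    """
    conn.execute(
        """
        CREATE TABLE tblAttorneyFlat AS
        SELECT a.AttorneyId, a.FirstName, a.LastName,
               f.LawFirmId, f.Company, f.City
        FROM tblAttorney a
        JOIN tblLawFirm f ON f.LawFirmId = a.LawFirmId
        """
    )
    conn.commit()


def demo():
    """Create two small source tables, consolidate them, and query the result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tblLawFirm (
            LawFirmId INTEGER PRIMARY KEY, Company TEXT, City TEXT);
        CREATE TABLE tblAttorney (
            AttorneyId INTEGER PRIMARY KEY, LawFirmId INTEGER,
            FirstName TEXT, LastName TEXT);
        INSERT INTO tblLawFirm VALUES (1, 'Example Firm LLP', 'Example City');
        INSERT INTO tblAttorney VALUES (5, 1, 'Pat', 'Doe');
        """
    )
    consolidate_attorney_tables(conn)
    rows = conn.execute("SELECT LastName, Company FROM tblAttorneyFlat")
    return rows.fetchall()
```

A flattened copy of this kind trades storage and update effort for simpler reads; whether that trade is worthwhile depends on the query load, which the description leaves open.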
WHAT IS CLAIMED IS:

1. A computer-readable storage medium having stored thereon:
    an item table listing a plurality of item records identifying items;
    an item attribute table listing a plurality of item attribute records identifying attributes of said items; and
    wherein there is a many-to-many relationship between item records and item attribute records.

2. The computer-readable storage medium of claim 1 wherein
    an item attribute item map table implements said many-to-many relationship between item records and item attribute records, said item attribute item map table listing a plurality of map records identifying both a particular item attribute and a particular item.

3. The computer-readable storage medium of claim 1 having further stored thereon:
    an item derivation table listing a plurality of item derivation records identifying transformations between ones of said items used in biological analysis.

4. The computer-readable storage medium of claim 3 having further stored thereon:
    a protocol table listing a plurality of protocol records specifying parameters of said transformation.

5. The computer-readable storage medium wherein said items are used in a biological analysis.
6. The computer-readable storage medium of claim 1 wherein said biological analysis comprises a polymorphism analysis.

7. A computer-readable storage medium having stored thereon:
    an atom result table listing a plurality of atom result records, specifying relative wild-type and mutant sequence concentrations in targets; and
    a subject sequence position table listing a plurality of subject sequence position records, specifying combinations of subjects from whom said targets are derived and sequence positions, each said atom result record being associated with one or more atom result records.

8. The computer-readable storage medium of claim 7 wherein said atom result records further specify upper and lower bounds for said concentrations.

9. The computer-readable storage medium of claim 7 having further stored thereon:
    a subject table listing subject records specifying said subjects.

10. A computer-readable storage medium having stored thereon:
    a polymorphism table listing polymorphism sequence records specifying sequences known to contain polymorphisms; and
    a patent application table listing patent application records specifying one or more polymorphisms specified by said polymorphism sequence records.

11. The computer-readable storage medium of claim 10 wherein said polymorphism sequence records specify for each one of said polymorphisms a polymorphism position, a reference allele, and a base allele.
12. The computer-readable storage medium of claim 11 wherein said polymorphism sequence records further specify for each one of said polymorphisms a measured heterozygocity.

13. A computer-implemented method comprising:
    creating an item table listing a plurality of item records identifying items used in biological analysis; and
    creating an item attribute table listing a plurality of item attribute records identifying attributes of said items; and
    wherein there is a many-to-many relationship between item records and item attribute records.

14. The computer-implemented method of claim 13 further comprising the step of:
    creating an item attribute item map table that implements said many-to-many relationship between item records and item attribute records, said item attribute item map table listing a plurality of map records identifying both a particular item attribute and a particular item.

15. The computer-implemented method of claim 13 comprising:
    an item derivation table listing a plurality of item derivation records identifying transformations between ones of said items used in biological analysis.

16. The computer-implemented method of claim 15 further comprising:
    creating a protocol table listing a plurality of protocol records specifying parameters of said transformation.
17. The computer-implemented method of claim 13 wherein said biological analysis comprises a polymorphism analysis.

18. A computer-implemented method comprising:
    creating an atom result table listing a plurality of atom result records, specifying relative wild-type and mutant sequence concentrations in targets; and
    creating a subject sequence position table listing a plurality of subject sequence position records, specifying combinations of subjects from whom said targets are derived and sequence positions, each said atom result record being associated with one or more atom result records.

19. The computer-implemented method of claim 18 wherein said atom result records further specify upper and lower bounds for said concentrations.

20. The computer-implemented method of claim 18 further comprising:
    creating a subject table listing subject records specifying said subjects.

21. A computer-implemented method comprising:
    creating a polymorphism table listing polymorphism sequence records specifying sequences known to contain polymorphisms; and
    creating a patent application table listing patent application records specifying one or more polymorphisms specified by said polymorphism sequence records.

22. The computer-implemented method of claim 21 wherein said polymorphism sequence records specify for each one of said polymorphisms a polymorphism position, a reference allele, and a base allele.
23. The computer-implemented method of claim 22 wherein said polymorphism sequence records further specify for at least one of said polymorphisms a measured heterozygocity.